Multi-Layer Perceptron model with Elastic Grey Wolf Optimization to predict student achievement

This study proposes a Grey Wolf Optimization (GWO) variant, the Elastic Grey Wolf Optimization algorithm (EGWO), which introduces shrinking, resilient surrounding, and weighted candidate mechanisms. The proposed EGWO is then used to optimize the weights and biases of a Multi-Layer Perceptron (MLP), yielding the EGWO-MLP model for predicting student achievement. The EGWO-MLP prediction model is trained and verified on the thirty attributes of the student performance dataset from the University of California, Irvine (UCI) Machine Learning Repository, which include family features and personal characteristics. For Mathematics (Mat.) achievement prediction, the EGWO-MLP model outperforms one of the compared models in prediction accuracy, and its standard deviation indicates a stable ability to predict student achievement. For the Portuguese (Por.) subject, EGWO-MLP outperforms three of the compared models during training and ranks first during testing. The results show that the EGWO-MLP model yields lower test errors, indicating that EGWO can effectively tune the weights and biases owing to its strong exploration and local stagnation avoidance. The EGWO-MLP model is therefore feasible for predicting student achievement. The study can provide a reference for improving school teaching programs, teachers' teaching quality, and students' learning outcomes.


Introduction
In a narrow sense, education refers to schooling organized by particular institutions. In a broad sense, it covers social practice activities that affect people's physical and mental development [1]. According to school conditions and professional titles, it aims to educate and cultivate cognitive development in a planned and organized way, teach people with existing experience and knowledge [2,3], explain various phenomena, problems, or behavior [4], and improve their practical ability. It is fundamental to recognizing and treating things with relatively mature or rational thinking.
which can provide an additional confidence measure and improve the acceptability of prediction [36]. Principal leaders, as a significant force in the school, affect student performance. Wu et al. conduct a multivariate meta-meta-analysis to investigate the relationship between the central leadership and student achievement [37]. Based on the DEA model and the Bootstrap method, Masci et al. measure the impact of school (district) size, management practice, and principal leaders' characteristics on student groups through standardized reading and mathematics test scores [38]. The experimental results show that the composition of school subjects mainly affects students' reading efficiency, while management practice mainly affects students' mathematics efficiency. Scholars choose various characteristics to predict student performance. Many higher education institutions try to understand student factors to improve the quality of education. Students' semantic trajectory is tested by dynamic time distortion, hierarchical clustering, and variance analysis. The experimental results show that semantic trajectory is a relevant factor affecting student performance [39]. Silva et al. analyze family characteristics, values, beliefs, expectations and family support, self-efficacy, goal progress, and academic achievement [40].
The results show that families affect academic performance through academic self-efficacy and views on the progress of educational goals. However, the information provided by self-efficacy has less impact but is more related to the support of material resources. Sarfraz et al. study the factors affecting the performance of business school students during the COVID-19 pandemic: students' views and preferences, the impact of the blended learning (BL) setting on students' academic performance, and the relationship between the unified theory of acceptance and use of technology (UTAUT) and students' academic performance [41]. Javadizadeh et al. study the impact of class structure, teaching style, and class environment on students' classroom realization, and draw on self-determination theory to hypothesize the relationship between the SCARF (status, certainty, autonomy, relatedness, fairness) elements, students' internal motivation, and classroom performance [42]. This study offers important guidance for improving students' enthusiasm.
Some scholars also choose deep learning techniques to predict student grades, which can be summarized as follows. To understand the factors influencing college students, Rivas et al. determine the critical factors of student performance from the number of visits to available resources, based on the tree model and different types of ANNs [43]. Li et al. propose a Multi-View Hypergraph Neural Network (MVHGNN) for predicting students' academic performance, which uses hypergraphs to construct high-order relations among students. A Cascade Attention Transformer (CAT) module is introduced to mine the weights of different behaviors through the self-attention mechanism. The experimental results demonstrate that the MVHGNN method outperforms state-of-the-art methods on real campus student behavioral datasets [44]. Bertolini et al. utilize bootstrapping to examine performance variability among five data mining methods (DMMs) and four filter preprocessing feature selection techniques for forecasting course grades for 3225 students enrolled in an undergraduate biology class [45]. Wu et al. propose a novel knowledge tracing model based on an exercise session graph, named session graph-based knowledge tracing (SGKT). The session graph models the students' answering process, and a relationship graph models the relationship between exercises and skills. The experimental results demonstrate that the model outperforms some existing baseline methods on three publicly available datasets [46]. Pallathadka et al. analyze the ability of machine learning methods, such as Naive Bayes, ID3, C4.5, and SVM, to predict students' performance in future tests. These methods are evaluated by criteria such as accuracy and error rate on the UCI Machine Learning Repository student performance data set [47]. Tomasevic et al. address the task of student exam performance prediction, i.e., discovering students at a "high risk" of dropping out of the course and predicting their future achievements, for instance final exam scores, by providing a comprehensive analysis and comparison of state-of-the-art supervised machine learning techniques [48].
Although scholars have adopted different methods to study the factors affecting student performance and achievement [49][50][51] and to predict final performance [52,53], few studies have examined the impact of students' characteristics and family factors on their achievement [54,55].

Ethics statement
In this study, the standard University of California, Irvine (UCI) Machine Learning Repository dataset (https://archive.ics.uci.edu/ml/datasets/Student+Performance) is selected for the simulation experiment. The standard UCI dataset is widely used as a general dataset and appears in many papers and studies, and the original data are provided on the official website. The Student Performance Data Set introduced in the experimental dataset and environment section is obtained from the official website; it was collected from two Portuguese secondary schools through reports and questionnaires on the performance of students in secondary education. The study does not involve human participants, human specimens or tissue, vertebrate animals or cephalopods, vertebrate embryos or tissues, or field research.

Grey Wolf Optimizer
Grey Wolf Optimizer (GWO) is a swarm intelligence algorithm that mimics the hunting behavior of wolves [28]. The superior performance of this algorithm benefits from the hierarchy mechanism of the wolf pack. Among the wolves, the α, β, and δ wolves are the three leading wolves. The rest, named ω wolves, form the lowest class and attack the prey. The GWO algorithm incorporates two primary operations: (1) surrounding the prey and (2) hunting the prey. The whole process of the algorithm is shown in Algorithm 1.
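For reference, in the standard GWO formulation, which this section is assumed to follow, the surrounding operation (Operation 1) is usually written as below. The equation numbers are matched to the citations in the text and are an assumption.

$$D = \left| C \cdot x_p(t) - x(t) \right| \qquad (1)$$

$$x(t+1) = x_p(t) - A \cdot D \qquad (2)$$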
where t is the current iteration number; x_p(t) is the location of the prey (equivalent to α, β, δ and ω); x(t) and x(t + 1) are the wolf locations at the t-th and (t + 1)-th iterations. D is the distance between the prey and the wolf. A and C are calculated by Eqs (3) and (4).
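In the standard GWO formulation (assumed here, with the numbering matched to the text), the two coefficients take the form:

$$A = 2a \cdot r_1 - a \qquad (3)$$

$$C = 2 \cdot r_2 \qquad (4)$$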
where a is reduced linearly from 2 to 0 over the iterations, and r_1 and r_2 are random values in the range [0, 1].
Operation 2: Hunting the prey. The α wolf leads the whole process. The final position is updated by Eqs (5) to (11) and can be any position within the circle. The ω wolves update their positions randomly around the prey, while the α, β, and δ wolves estimate the prey location.
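As a hedged reconstruction, the hunting operation in the standard GWO formulation consists of the seven relations below. The symbols A_1 to A_3 and C_1 to C_3 (independent draws of A and C for each leader) and X_1 to X_3 come from that standard formulation rather than from the surviving text, and the mapping of individual lines to the numbers (5) to (11) is an assumption.

$$D_\alpha = \left| C_1 \cdot X_\alpha - X \right|, \quad D_\beta = \left| C_2 \cdot X_\beta - X \right|, \quad D_\delta = \left| C_3 \cdot X_\delta - X \right| \qquad (5)\text{-}(7)$$

$$X_1 = X_\alpha - A_1 \cdot D_\alpha, \quad X_2 = X_\beta - A_2 \cdot D_\beta, \quad X_3 = X_\delta - A_3 \cdot D_\delta \qquad (8)\text{-}(10)$$

$$X(t+1) = \frac{X_1 + X_2 + X_3}{3} \qquad (11)$$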
Algorithm 1: The basic Grey Wolf Optimization Algorithm
Step 1: Initialize the grey wolf population: X_i (i = 1, 2, ..., n);
Step 2: Initialize the parameters: a, A and C;
Step 3: Calculate the fitness value of each wolf;
Step 4: While t < t_max
    For each wolf
        Update the current position based on Eqs (1) to (11);
    End For
End While
Step 5: Return X_α.
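The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration of the standard GWO update, not the authors' MATLAB code: the function name `gwo_minimize`, the search bounds `lb` and `ub`, the seed, and the choice to keep the three best positions seen so far as leaders are all assumptions. The defaults of 10 wolves and 300 iterations mirror the settings reported in the experimental section.

```python
import random


def gwo_minimize(fitness, dim, n_wolves=10, t_max=300, lb=-1.0, ub=1.0, seed=0):
    """Minimal sketch of the basic GWO (standard formulation assumed)."""
    rng = random.Random(seed)
    # Step 1: initialize the wolf population uniformly inside the bounds.
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    # Step 3: alpha, beta, delta are the three best (fitness, position) pairs.
    leaders = sorted(((fitness(w), w[:]) for w in wolves), key=lambda p: p[0])[:3]
    # Step 4: main loop.
    for t in range(t_max):
        a = 2.0 * (1.0 - t / t_max)  # decreases linearly from 2 towards 0
        for w in wolves:
            for d in range(dim):
                total = 0.0
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a   # coefficient A
                    C = 2.0 * rng.random()           # coefficient C
                    D = abs(C * leader[d] - w[d])    # distance to this leader
                    total += leader[d] - A * D       # candidate from this leader
                # Average of the three candidates, clipped to the bounds.
                w[d] = min(ub, max(lb, total / 3.0))
        scored = [(fitness(w), w[:]) for w in wolves]
        # Keep the three best positions seen so far as the new leaders.
        leaders = sorted(leaders + scored, key=lambda p: p[0])[:3]
    # Step 5: return the alpha wolf and its fitness.
    best_fit, best_pos = leaders[0]
    return best_pos, best_fit
```

A call such as `gwo_minimize(lambda x: sum(v * v for v in x), dim=5)` minimizes a simple sphere function and returns the best position found together with its fitness.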

The variant of the GWO
Since the GWO was proposed, it has attracted the attention of many scholars. In the past two years (2021 and 2022), many scholars have proposed a variety of variants, some of which are as follows:

Multi-Layer Perceptron (MLP)
Multi-Layer Perceptron (MLP) is an artificial neural network with a feed-forward structure, which maps a set of input vectors to a group of output vectors [66,67]. MLP (as shown in Fig 2) can be regarded as a directed graph composed of multiple node layers, in which each layer connects to the next layer. Each node is a neuron (or processing unit) with a non-linear activation function. The basic structure of the multi-layer perceptron consists of three layers: the input layer, the hidden layer in the middle, and the output layer. The products of the inputs and weights are fed to the summation node together with the neuron bias. The primary calculation process is as follows: (1) The weighted sum of the inputs can be calculated by Eq (12).
$$s_j = \sum_{i=1}^{n} (W_{ij} X_i) - \theta_j, \quad j = 1, 2, \ldots \qquad (12)$$
where n is the number of input nodes, W_ij indicates the weight linking the i-th input layer node and the j-th hidden layer node, and X_i represents the i-th input.
(2) The output of the hidden nodes can be calculated by Eq (13).
$$o_k = \sum_{j} (w_{jk} S_j) - \theta'_k, \quad k = 1, 2, \ldots, m \qquad (14)$$
where S_j is the output of the j-th hidden node, w_jk is the weight connecting the j-th hidden node with the k-th output node, and θ_j and θ'_k are the biases of the j-th hidden node and the k-th output node.
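The two weighted sums in Eqs (12) and (14) can be read as the following forward pass. This is a sketch under stated assumptions: the text only says that each node has a non-linear activation function, so the sigmoid used here for both the hidden and output nodes is an assumption, and the names `sigmoid` and `mlp_forward` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function (assumed activation)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mlp_forward(x, W, theta, w, theta_out):
    """One-hidden-layer forward pass.

    x: list of n inputs; W[i][j]: weight from input i to hidden node j;
    theta[j]: bias of hidden node j; w[j][k]: weight from hidden node j
    to output node k; theta_out[k]: bias of output node k.
    """
    hidden = []
    for j in range(len(theta)):
        # Weighted sum of the inputs minus the hidden bias, as in Eq (12).
        s_j = sum(W[i][j] * x[i] for i in range(len(x))) - theta[j]
        hidden.append(sigmoid(s_j))  # hidden node output S_j
    outputs = []
    for k in range(len(theta_out)):
        # Weighted sum of hidden outputs minus the output bias, as in Eq (14).
        o_k = sum(w[j][k] * hidden[j] for j in range(len(hidden))) - theta_out[k]
        outputs.append(sigmoid(o_k))
    return outputs
```

With all weights and biases set to zero, every weighted sum is zero, so each sigmoid output equals 0.5 regardless of the input.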

Elastic Grey Wolf Optimization algorithm (EGWO)
To improve the performance of the basic GWO, shrinking, resilient surrounding, and weighted candidate mechanisms are introduced to remedy its local stagnation and premature convergence deficiencies, yielding the proposed Elastic Grey Wolf Optimization algorithm (EGWO).
Shrinking mechanism. The Cauchy distribution is a continuous probability distribution whose mathematical expectation does not exist. A random variable with this probability density function is said to follow the Cauchy distribution. To prevent the wolf positions from falling into local stagnation, a shrinking mechanism inspired by the Cauchy distribution is proposed to update the parameters A and C. The parameters A and C are computed by Eqs (3) and (4), with the random values r_1 and r_2 updated by Eqs (16) and (17). Parameter a is calculated by Eq (18).
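For reference, the Cauchy probability density function with location x_0 and scale γ (symbols introduced here only for illustration) is given below. The integral that would define its mean does not converge, which is the sense in which the expectation does not exist. The specific way Eqs (16) to (18) draw on this distribution is not assumed here.

$$f(x; x_0, \gamma) = \frac{1}{\pi \gamma \left[ 1 + \left( \frac{x - x_0}{\gamma} \right)^2 \right]}$$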
Resilient surrounding mechanism. The core of the GWO algorithm is the movement and location of the three leading wolves, α, β, and δ. The original location updating strategy cannot effectively solve high-dimensional and complex problems. Because of local stagnation and premature convergence, it is difficult to find the best solution when solving such complex real-world problems. The resilient surrounding mechanism is introduced to overcome this weakness. The positions of the α, β, and δ wolves are updated by Eqs (19)-(22).
Weighted candidate mechanism. In the basic GWO, the α wolf is used for hunting the prey, which leads to local stagnation. The weighted candidate mechanism is introduced to avoid this problem. First, the weighted coefficient is computed by Eq (23) to adjust the step direction and length. Second, candidate wolves prepare to hunt the prey, and their positions are updated by Eqs (24) to (27). The whole process of the algorithm is shown in Algorithm 2.

Step 4: For every wolf
    Set r_1 and r_2 by Eqs (16) and (17);
    Initialize a according to Eq (18);
    Update X_α, X_β and X_δ according to Eqs (20)-(22);
    Update the weights w_1 to w_3 by Eq (23);
    Generate the candidate wolves by Eqs (24) to (27);
    Update the wolf population;
End For
t = t + 1
End While
Step 5: Return X_α.

EGWO-MLP: Student achievement prediction model
The EGWO-MLP prediction model aims to predict student achievement and then determine the essential variables affecting educational success or failure. Its hidden layer structure and dynamic adjustment of weight parameters make it well suited to predicting students' final achievement. The prediction of student achievement can be expressed in The model construction comprises input, operation, and output. The process includes normalization, determination of the input, output, and hidden units, setting of the training parameters, creation of the network model, calling of the activation function, etc. The output is the predicted outcome. If the output for the test sample meets the expectation of the training sample, learning ends. Otherwise, the model learns again and adjusts the threshold until the termination conditions are met. The whole process of EGWO training MLP is shown in
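The link between the optimizer and the network can be sketched as a fitness function: each wolf is a flat vector that is decoded into the weights and biases of the MLP, and its fitness is the mean squared error on the training samples. The sketch below is an illustration only. The helper names (`decode`, `predict`, `mse_fitness`), the layout of the flat vector, the sigmoid activation, and the assumption that targets are normalized to [0, 1] are not taken from the paper.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function (assumed activation)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode(vector, sizes):
    """Split a flat candidate vector into per-layer (weights, biases)."""
    layers, pos = [], 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        # Row i holds the weights leaving node i of the previous layer.
        W = [vector[pos + i * n_out: pos + (i + 1) * n_out] for i in range(n_in)]
        pos += n_in * n_out
        b = vector[pos:pos + n_out]
        pos += n_out
        layers.append((W, b))
    return layers


def predict(vector, sizes, x):
    """Forward pass through every layer encoded in the candidate vector."""
    out = list(x)
    for W, b in decode(vector, sizes):
        out = [
            sigmoid(sum(W[i][j] * out[i] for i in range(len(out))) - b[j])
            for j in range(len(b))
        ]
    return out


def mse_fitness(vector, sizes, samples, targets):
    """Squared error summed over outputs, averaged over samples."""
    total = 0.0
    for x, d in zip(samples, targets):
        o = predict(vector, sizes, x)
        total += sum((o_i - d_i) ** 2 for o_i, d_i in zip(o, d))
    return total / len(samples)
```

Any population-based optimizer that minimizes a function of a flat vector could then be pointed at `lambda v: mse_fitness(v, sizes, samples, targets)`; the stopping rule and the threshold adjustment mentioned in the text are left to the optimizer.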

Student data
In this study, recent real-world data from two Portuguese secondary schools are analyzed to train and verify the prediction model. The data come from the student performance dataset of the University of California, Irvine (UCI) Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Student+Performance). This dataset was collected through reports and questionnaires on students' performance in secondary education in two Portuguese schools. The proportions of the two schools in the UCI dataset are shown in Fig 6. The data attributes include student achievement, demographic, social, and school-related characteristics, and two data sets are provided on performance in two different subjects: Mathematics (Mat.) and Portuguese (Por.) [68]. The dataset contains thirty attributes, which are shown in Table 1. The second column of Table 1 gives the names of the thirty attributes, and the third column describes each attribute. The thirty attributes are the input of the prediction model. In this paper, 80% of the data are used for training and 20% for testing.
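The 80%/20% partition can be sketched as follows. Whether the records were shuffled before splitting, and with which seed, is not stated in the text, so both are assumptions, and `split_80_20` is a hypothetical helper name.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of the records and split them 80% / 20%."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 8 // 10  # integer arithmetic avoids rounding surprises
    return shuffled[:cut], shuffled[cut:]
```

The function returns the training part first and the testing part second, and leaves the original list untouched.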

Prediction model (EGWO-MLP) parameters setting
The EGWO-MLP prediction model contains an input layer, two hidden layers, and an output layer. The input layer takes 30 attributes from the UCI student performance dataset, including family features and personal characteristics, as the input nodes. The number of hidden layers is set to 2: the first hidden layer has (2 × number of inputs + 1) nodes, and the second hidden layer has two nodes. In the prediction model of this paper, the 30 factors that affect students are used as the input of the model, the two hidden layers, G1 and G2, provide the hidden nodes, and the student's grades are the output of the model.

PLOS ONE
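As a rough check on the size of the search space implied by the layer sizes above, the sketch below counts the weights and biases that EGWO would have to tune. It assumes fully connected layers and a single output node; the function name `mlp_dimension` is hypothetical.

```python
def mlp_dimension(n_inputs=30, n_outputs=1):
    """Count weights and biases for the 30 -> (2*30+1) -> 2 -> 1 layout."""
    h1 = 2 * n_inputs + 1   # first hidden layer: 2 x inputs + 1 nodes
    h2 = 2                  # second hidden layer: two nodes
    sizes = [n_inputs, h1, h2, n_outputs]
    weights = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))
    biases = sum(sizes[1:])  # one bias per non-input node
    return sizes, weights, biases
```

Under these assumptions the layout is 30, 61, 2, 1 nodes, giving 30 × 61 + 61 × 2 + 2 × 1 = 1954 weights and 61 + 2 + 1 = 64 biases, that is, 2018 values per candidate solution.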

Experimental environment setting
The experimental environment adopts MATLAB, an advanced technical computing language and interactive environment integrating numerical analysis, data visualization, matrix calculation, and non-linear dynamic modeling. The experiments are coded in MATLAB R2015b under the Windows 10 operating system, and all simulations run on a computer with an Intel Core(TM) i3-6100 CPU @ 3.70GHz and 8 GB of memory. Each experiment is run twenty times to assess predictive performance. The population size and the maximum number of iterations are 10 and 300, respectively, and the parameter settings of the comparison algorithms are shown in Table 2.
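The twenty-run protocol amounts to simple bookkeeping, sketched below. It assumes a hypothetical callable `run_once(seed)` that trains one model and returns its error, and it assumes the sample standard deviation is the one reported; neither detail is stated in the text.

```python
import statistics


def summarize_runs(run_once, n_runs=20):
    """Run an experiment n_runs times; return the mean and sample std of the errors."""
    errors = [run_once(seed) for seed in range(n_runs)]
    return statistics.mean(errors), statistics.stdev(errors)
```

Passing the run index as a seed keeps the runs independent yet repeatable.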

Criteria for evaluating performance
The training error is the error between the value predicted by the model and the actual value in the training set. The Mean Square Error (MSE) is used as the training error. MSE measures the difference between the actual value and the value predicted by the training algorithm [69,70], and it is widely used as a criterion [71]. MSE is computed by Eq (28).
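Assuming the paper follows the formulation commonly used when MLPs are trained with metaheuristics, Eq (28) would take the form below. This is a reconstruction from the symbol definitions in the text, not a quotation.

$$\mathrm{MSE} = \sum_{i=1}^{m} \left( o_i^k - d_i^k \right)^2 \qquad (28)$$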
where m is the number of outputs, d_i^k is the desired output of the i-th input unit when the k-th training sample is used, and o_i^k is the actual output of the i-th input unit for the k-th training sample. To ensure the fairness and effectiveness of the experiment, the average MSE ($\overline{\mathrm{MSE}}$) over all training samples is computed by Eq (29).
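Under the same assumption about the formulation, the average over the training samples would read as follows (a reconstruction, with the numbering matched to the surrounding citations):

$$\overline{\mathrm{MSE}} = \sum_{k=1}^{s} \frac{\sum_{i=1}^{m} \left( o_i^k - d_i^k \right)^2}{s} \qquad (29)$$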
where s is the number of training samples. The training of an MLP involves multiple variables and functions, and the average MSE used by the EGWO algorithm is calculated by Eq (30). The test error is the average error of the model on the test set, which measures the model's generalization ability. In practice, the test error should be as small as possible.

Experimental results and discussion
To further analyze the variables that affect student achievement, SPSS software is used to analyze the dataset and obtain the variable importance, and the results are shown in Fig 7. The results show that the 30 selected variables differ in their importance to the final output. According to the analysis of the UCI dataset, the proportion of girls in the survey reaches 53%, as shown in Fig 8(a). As seen from Fig 8(b), most students' home addresses are in cities. As shown in Fig 8(c), relatively more of the surveyed students receive family educational support. The time students spend learning also determines their degree of knowledge acquisition. According to Fig 9(a), most students study for less than two hours every week. Guardians have a direct impact on students' living environment and, in turn, on their learning environment. It can be seen from Fig 9(b) that the guardians of most students are mothers, accounting for 69%. The job distribution of students' parents is displayed in Fig 10, where other jobs account for a large proportion. In addition, service workers account for a large proportion, reaching 26% for fathers and 26% for mothers. Variables of differing importance, such as sex, home address type, family educational support, study time, guardian, and parents' jobs, are selected to discuss the results.
Based on the above factors, the EGWO-MLP student achievement prediction model takes 30 factors, including those above, as input to forecast student achievement. To verify the optimization superiority of the EGWO algorithm, it is compared with other algorithms, including the Advanced Grey Wolf Optimization algorithm (AGWO) [19], Particle Swarm Optimization (PSO) [72], Genetic Algorithm (GA) [73], Bat Algorithm (BA) [74], Differential Evolution (DE) [75], and the Sine Cosine Algorithm (SCA) [76].

"Do Not Test" occurs for a comparison when no significant difference is found between the two rank sums that enclose that comparison.

Discussion 1: Mathematics (Mat.)
For Mathematics (Mat.) achievement prediction, Tables 3 and 4 show that the EGWO-MLP model obtains the best results among the models based on swarm intelligence optimization algorithms, although it is worse than those based on evolutionary algorithms. However, EGWO-MLP attains the smallest standard deviation (std) of the training error rate. Owing to the strong exploration and local stagnation avoidance of EGWO, it can tune the weights and biases more effectively than the basic GWO-MLP when predicting student achievement. To further verify the performance of the EGWO-MLP model and its difference from the other algorithm-MLP models, the ANOVA on ranks test is applied. The results, shown in Table 5, demonstrate that EGWO-MLP is superior to the AGWO-MLP, GWO-MLP, and GA-MLP models. In general, EGWO-MLP has a significant advantage in predicting student achievement.
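ANOVA on ranks is commonly known as the Kruskal-Wallis H test. As a hedged illustration of what the test measures, the sketch below computes the H statistic from the error values of several models. It omits the tie correction and the pairwise follow-up comparisons, the function name is hypothetical, and it is not the statistical software used in the paper.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) for two or more groups."""
    # Pool all values, remembering which group each one came from.
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # tied values share the average of ranks i+1..j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j
    spread = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * spread - 3.0 * (n + 1)
```

Under the usual chi-square approximation with three groups (two degrees of freedom), an H value above roughly 5.99 corresponds to a difference at the 0.05 level; for small samples this approximation is only indicative.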

Discussion 2: Portuguese (Por.)
For the Portuguese (Por.) subject, as shown in Tables 6 and 7, EGWO-MLP outperforms most models based on swarm intelligence optimization algorithms during model training. However, due to the unique evolutionary characteristics of evolutionary algorithms, it is difficult for EGWO-MLP to surpass the models they optimize. For example, the distinctive difference strategy of DE enhances its exploration ability and helps it avoid local stagnation in the optimization process. During the testing process, it is difficult for the compared models to achieve stable model optimization, whereas EGWO-MLP obtains the lowest test error and standard deviation. The results of the statistical tests are shown in Table 8. They show that EGWO-MLP outperforms most of the compared models and has strong stability.
To sum up, this section selects two subjects (Mathematics (Mat.) and Portuguese (Por.)) to train and test the model. The experimental results show that EGWO-MLP is better than the selected models based on swarm intelligence optimization algorithms. Because of their unique evolution strategy, it is difficult to outperform the models trained by typical evolutionary algorithms during the training process. However, the testing process shows that EGWO-MLP is more stable and effective and can outperform the compared models. The shrinking, resilient surrounding, and weighted candidate mechanisms determine how the wolf positions update the step direction and length. This is instrumental in accelerating convergence, avoiding local stagnation, and balancing exploration and exploitation. These advantages ensure that EGWO can effectively optimize the weights and biases of the MLP, enabling the EGWO-MLP model to solve high-dimensional complex problems and analyze a large amount of data. To establish the validity of the experiment, the experimental results are statistically analyzed, demonstrating that the EGWO-MLP model is effective in dealing with the problem of student achievement prediction. The analysis of the EGWO-MLP model shows that the thirty selected inputs are conducive to predicting student achievement. For the thirty variables, the importance weights are assigned through EGWO to avoid the influence of externally assigned weights on the final prediction results.

Suggestions
Based on the model construction in the above sections and the analysis and discussion of the experimental results, the implications for promoting student performance and teaching effectiveness are as follows.
Firstly, with the rapid development of modern information technology, communication technology, and computer technology, the scope, depth, and scale of database applications are expanding. Big data mining and analysis can also benefit educational institutions at all levels. Currently, the use of data stored in schools' management systems is still at a relatively primary stage. Generally, only simple queries and statistical tables are provided in the system, while a large amount of information affecting students' learning is not accessible. Data mining and analysis through the EGWO-MLP model can make full use of the obtained data to reveal more accurately the correlation between student performance and family, individual, and school factors. This can provide a basis for school decision-makers and help them monitor and regulate the factors affecting teaching quality more comprehensively, so as to ensure the quality of education.
Secondly, the EGWO-MLP model is generated by mining the main factors affecting student achievement in a subject. Through the analysis of the characteristics of learners, we can understand the learning environment, cognitive factors, and learning ability of different individuals. On this basis, teachers can provide personalized teaching content and methods according to group differences and learners' characteristics, which gives teachers a basis for adjusting teaching strategies according to students' aptitudes. This method can be applied to other subjects so that students can maintain a good learning state and improve the overall learning effect.
Thirdly, the EGWO-MLP model proposed in this paper can provide predictions and early warnings for students, which helps teachers manage and support them. At the same time, the prediction results can enhance students' awareness of self-learning and improve teachers' teaching quality. The current study can help teachers predict student achievement, reflect on teaching performance, and obtain technical analysis strategies and management recommendations for high-quality teacher training. It can also prevent teachers from selectively ignoring students with a poor foundation, which would make the teaching process unfair.
Fourthly, during the COVID-19 epidemic, online teaching has become more widespread, making it difficult to ensure students' performance and to guide students effectively in study or training. Meanwhile, online learning achievement prediction mainly relies on structured data, which makes it difficult to mine learners' states, emotions, and other information deeply and accurately, and this affects prediction accuracy. Therefore, as this paper suggests, swarm intelligence technology can be combined with neural networks to improve prediction accuracy and education quality.

Conclusion and future work
During the COVID-19 pandemic, changes in curriculum arrangement and teachers' teaching methods have made student academic achievement and performance a focus of education. To manage students' learning factors effectively and guide students efficiently in their learning, a direct and effective way to predict student performance is needed. The improvement of student performance and ability is a critical issue in education. The analysis of the effect of family features and personal characteristics on student achievement and performance shows that student achievement prediction is a non-linear, high-dimensional, and complex practical problem. The existing NFL theory-based prediction methods fail to predict student achievement and performance because they cannot fully cover the influencing factors.
To predict student performance more accurately, this paper builds a prediction model based on MLP. MLP is a method for solving high-dimensional complex problems, and it has been applied to predict student achievement in previous research and educational practice. Since the MLP tends to fall into local stagnation, it is challenging to obtain the optimal solution, and thus good optimization results, during the optimization process. Therefore, a swarm intelligence optimization algorithm is introduced to optimize the weights and biases of the MLP.
To solve the above problems and obtain better experimental results, this paper proposes a variant of the grey wolf optimization algorithm, the Elastic Grey Wolf Optimization algorithm (EGWO). EGWO is integrated with MLP to optimize the weights and biases and thus predict student achievement effectively. The contributions can be summarized as follows:
• Shrinking and resilient surrounding mechanisms compute the positions of the α, β, and δ wolves to enhance exploration.
• With the weighted candidate mechanism, the hunting operation is no longer limited to the α wolf. The candidate wolves help avoid local stagnation.
• The proposed EGWO algorithm optimizes the weights and biases of the MLP to obtain an accurate prediction value.
• The EGWO-MLP model is proposed to predict student achievement with reduced test error. It can fully mine data information and make full use of it for prediction.
Experiments show that artificial neural networks have certain advantages in student performance prediction and can help manage and cultivate students effectively. However, some limitations influence the prediction accuracy, such as the amount of data and the feature attributes available in the UCI data. In the future, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks can be used to predict student performance along the timeline. At the same time, more effective swarm intelligence techniques can be chosen to optimize the neural network structure and adjust parameters to improve prediction accuracy.

Author Contributions
Funding acquisition: Yinqiu Song.